Incremental Learning Using a Grow-and-Prune Paradigm with Efficient Neural Networks
Deep neural networks (DNNs) have become a widely deployed model for numerous
machine learning applications. However, their fixed architecture, substantial
training cost, and significant model redundancy make it difficult to
efficiently update them to accommodate previously unseen data. To solve these
problems, we propose an incremental learning framework based on a
grow-and-prune neural network synthesis paradigm. When new data arrive, the
neural network first grows new connections based on the gradients to increase
the network capacity to accommodate new data. Then, the framework iteratively
prunes away connections based on the magnitude of weights to enhance network
compactness, and hence recover efficiency. Finally, the model rests at a
lightweight DNN that is both ready for inference and suitable for future
grow-and-prune updates. The proposed framework improves accuracy, shrinks
network size, and significantly reduces the additional training cost for
incoming data compared to conventional approaches, such as training from
scratch and network fine-tuning. For the LeNet-300-100 and LeNet-5 neural
network architectures derived for the MNIST dataset, the framework reduces
training cost by up to 64% (63%) and 67% (63%) compared to training from
scratch (network fine-tuning), respectively. For the ResNet-18 architecture
derived for the ImageNet dataset and DeepSpeech2 for the AN4 dataset, the
corresponding training cost reductions against training from scratch (network
fine-tuning) are 64% (60%) and 67% (62%), respectively. Our derived models
contain fewer network parameters but achieve higher accuracy relative to
conventional baselines.
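The grow-and-prune update described above can be pictured as two mask operations: growth activates dormant connections with the largest gradient magnitudes, and pruning removes active connections with the smallest weight magnitudes. The following is a minimal NumPy sketch under an assumed dense-matrix representation, not the paper's implementation; all names are illustrative.

```python
import numpy as np

def grow_connections(weights, mask, grads, n_grow):
    """Activate the n_grow dormant connections with the largest gradient magnitude."""
    candidates = np.abs(grads) * (mask == 0)            # only dormant positions compete
    idx = np.argsort(candidates, axis=None)[-n_grow:]   # largest-gradient positions
    mask.flat[idx] = 1
    weights.flat[idx] = 0.0                             # grown connections start at zero
    return weights, mask

def prune_connections(weights, mask, n_prune):
    """Deactivate the n_prune active connections with the smallest weight magnitude."""
    magnitude = np.where(mask == 1, np.abs(weights), np.inf)
    idx = np.argsort(magnitude, axis=None)[:n_prune]
    mask.flat[idx] = 0
    weights.flat[idx] = 0.0
    return weights, mask
```

An incremental update would alternate these two steps on new data until the desired compactness is recovered.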
SCANN: Synthesis of Compact and Accurate Neural Networks
Deep neural networks (DNNs) have become the driving force behind recent
artificial intelligence (AI) research. An important problem with implementing a
neural network is the design of its architecture. Typically, such an
architecture is obtained manually by exploring its hyperparameter space and
kept fixed during training. This approach is time-consuming and inefficient.
Another issue is that modern neural networks often contain millions of
parameters, whereas many applications and devices require small inference
models. However, efforts to migrate DNNs to such devices typically entail a
significant loss of classification accuracy. To address these challenges, we
propose a two-step neural network synthesis methodology, called DR+SCANN, that
combines two complementary approaches to design compact and accurate DNNs. At
the core of our framework is the SCANN methodology that uses three basic
architecture-changing operations, namely connection growth, neuron growth, and
connection pruning, to synthesize feed-forward architectures with arbitrary
structure. SCANN encapsulates three synthesis methodologies that apply a
repeated grow-and-prune paradigm to three architectural starting points.
DR+SCANN combines the SCANN methodology with dataset dimensionality reduction
to alleviate the curse of dimensionality. We demonstrate the efficacy of SCANN
and DR+SCANN on various image and non-image datasets. We evaluate SCANN on
MNIST and ImageNet benchmarks. In addition, we evaluate the efficacy of
using dimensionality reduction alongside SCANN (DR+SCANN) on nine small to
medium-size datasets. We also show that our synthesis methodology yields neural
networks that are much better at navigating the accuracy vs. energy efficiency
space. This would enable neural network-based inference even on
Internet-of-Things sensors. Comment: 13 pages, 8 figures.
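The DR step in DR+SCANN reduces input dimensionality before architecture synthesis. The abstract does not name the reduction method, so as an illustrative stand-in, here is a plain PCA projection in NumPy:

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project samples (rows of X) onto the top principal components."""
    Xc = X - X.mean(axis=0)                        # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T                # reduced-dimensionality features
```

Feeding lower-dimensional inputs to the synthesis loop shrinks the first layer, which is often the largest in small feed-forward networks.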
PinMe: Tracking a Smartphone User around the World
With the pervasive use of smartphones that sense, collect, and process
valuable information about the environment, ensuring location privacy has
become one of the most important concerns in the modern age. A few recent
research studies discuss the feasibility of processing data gathered by a
smartphone to locate the phone's owner, even when the user does not intend to
share his location information, e.g., when the Global Positioning System (GPS)
is off. Previous research efforts rely on at least one of the two following
fundamental requirements, which significantly limit the ability of the
adversary: (i) the attacker must accurately know either the user's initial
location or the set of routes through which the user travels and/or (ii) the
attacker must measure a set of features, e.g., the device's acceleration, for
potential routes in advance and construct a training dataset. In this paper, we
demonstrate that neither of the above-mentioned requirements is essential for
compromising the user's location privacy. We describe PinMe, a novel
user-location mechanism that exploits non-sensory/sensory data stored on the
smartphone, e.g., the environment's air pressure, along with publicly-available
auxiliary information, e.g., elevation maps, to estimate the user's location
when all location services, e.g., GPS, are turned off. Comment: This is the preprint version; the paper has been published in IEEE
Trans. Multi-Scale Computing Systems, DOI: 10.1109/TMSCS.2017.275146
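PinMe's use of air pressure rests on the fact that pressure maps to elevation, which can then be matched against public elevation maps along a trajectory. A minimal sketch of that first step, using the international barometric formula with a standard-atmosphere sea-level reference (an assumption for illustration, not the paper's code):

```python
def pressure_to_elevation(p_hpa, p0_hpa=1013.25):
    """International barometric formula: air pressure in hPa -> altitude in meters,
    assuming the standard atmosphere and sea-level reference pressure p0_hpa."""
    return 44330.0 * (1.0 - (p_hpa / p0_hpa) ** (1.0 / 5.255))
```

Comparing the estimated elevation profile of a drive or walk against elevation maps is what lets an attacker narrow down the route without GPS.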
SPRING: A Sparsity-Aware Reduced-Precision Monolithic 3D CNN Accelerator Architecture for Training and Inference
CNNs outperform traditional machine learning algorithms across a wide range
of applications. However, their computational complexity makes it necessary to
design efficient hardware accelerators. Most CNN accelerators focus on
exploring dataflow styles that exploit computational parallelism. However,
potential performance speedup from sparsity has not been adequately addressed.
The computation and memory footprint of CNNs can be significantly reduced if
sparsity is exploited in network evaluations. To take advantage of sparsity,
some accelerator designs explore sparsity encoding and evaluation on CNN
accelerators. However, sparsity encoding is typically performed only on activations or
weights, and only during inference. It has been shown that activations and weights
also exhibit high sparsity levels during training. Hence, sparsity-aware computation
should also be considered in training. To further improve performance and
energy efficiency, some accelerators evaluate CNNs with limited precision.
However, this has been limited to inference, since reduced precision sacrifices
network accuracy if used in training. In addition, CNN evaluation is usually
memory-intensive, especially in training. In this paper, we propose SPRING, a
SParsity-aware Reduced-precision Monolithic 3D CNN accelerator for trainING and
inference. SPRING supports both CNN training and inference. It uses a binary
mask scheme to encode sparsity in activations and weights. It uses the
stochastic rounding algorithm to train CNNs with reduced precision without
accuracy loss. To alleviate the memory bottleneck in CNN evaluation, especially
in training, SPRING uses an efficient monolithic 3D NVM interface to increase
memory bandwidth. Compared to GTX 1080 Ti, SPRING achieves 15.6X, 4.2X and
66.0X improvements in performance, power reduction, and energy efficiency,
respectively, for CNN training, and 15.5X, 4.5X and 69.1X improvements for
inference.
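The stochastic rounding that lets SPRING train at reduced precision without accuracy loss works by rounding a value up with probability equal to its fractional distance, so the expected rounded value equals the original. A NumPy sketch of the idea (illustrative, not the hardware algorithm):

```python
import numpy as np

def stochastic_round(x, step, rng):
    """Round each element of x to a multiple of `step`; round up with probability
    equal to the fractional part, so the result is unbiased in expectation."""
    scaled = np.asarray(x, dtype=float) / step
    floor = np.floor(scaled)
    frac = scaled - floor
    round_up = rng.random(scaled.shape) < frac
    return (floor + round_up) * step
```

Because the rounding error has zero mean, gradient updates accumulated over many steps are not systematically biased, unlike round-to-nearest at very low precision.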
MOLECULAR DOCKING STUDIES FOR THE COMPARATIVE ANALYSIS OF DIFFERENT BIOMOLECULES TO TARGET HYPOXIA INDUCIBLE FACTOR-1α
Objective: Hypoxia plays a significant role in governing many vital signalling molecules in the central nervous system (CNS). Hypoxic exposure has also been depicted as a stimulus for oxidative stress, an increase in lipid peroxidation, DNA damage, blood-brain barrier dysfunction, impaired calcium (Ca2+) homoeostasis, and agglomeration of oxidized biomolecules in neurons, which act as a novel signature in diverse neurodegenerative and oncogenic processes. Conversely, abnormally impaired expression of HIF-1α under hypoxic insult could serve as an indication of the existence of tumors and of neuronal dysfunction as well. For instance, under hypoxic stress, amyloid-β protein precursor (AβPP) cleavage is triggered due to the higher expression of HIF-1α and thus leads to synaptic loss. The objective of this research is to perform comparative studies of biomolecules in regulating HIF-1α activity based on in silico approaches that could establish a potential therapeutic window for the treatment of different abnormalities associated with impaired HIF-1α. Methods: We employed various in silico methods, including drug-likeness parameters (Lipinski filter analysis), the MUSCLE tool, SWISS-MODEL, active-site prediction, AutoDock 4.2.1, and LigPlot 1.4.5 for molecular docking studies. Results: The 3D structure of HIF-1α was generated and a Ramachandran plot obtained for quality assessment. RAMPAGE displayed 99.5% of residues in the most favoured regions, 0% in additionally allowed regions, and 0.5% in disallowed regions of the HIF-1α protein. Further, initial screening of the molecules was done based on Lipinski's rule of five. The CASTp server, used to predict the ligand-binding site, suggests that this protein can be utilised as a potential drug target.
Finally, we found Naringenin to be the most effective among the three biomolecules in modulating HIF-1α, based on the minimum inhibition constant, Ki, and the highest negative free energy of binding with the maximum interacting surface area during docking studies. Conclusion: The present study outlines the novel potential of biomolecules in regulating HIF-1α activity for the treatment of different abnormalities associated with impaired HIF-1α.
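Lipinski's rule of five, used above for initial screening, checks four molecular descriptors; a candidate is conventionally considered drug-like if it violates at most one threshold. A sketch with a plain descriptor dict (the keys are hypothetical; a real pipeline would compute the descriptors with a cheminformatics toolkit):

```python
def passes_lipinski(descriptors):
    """Lipinski's rule of five: at most one violation of the four thresholds."""
    violations = sum([
        descriptors["mol_weight"] > 500,     # molecular weight <= 500 Da
        descriptors["logp"] > 5,             # octanol-water partition coeff. <= 5
        descriptors["h_donors"] > 5,         # hydrogen-bond donors <= 5
        descriptors["h_acceptors"] > 10,     # hydrogen-bond acceptors <= 10
    ])
    return violations <= 1
```

For example, naringenin's published descriptors (MW ≈ 272 Da, logP ≈ 2.5, 3 donors, 5 acceptors) pass the filter.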
CTRL: Clustering Training Losses for Label Error Detection
In supervised machine learning, use of correct labels is extremely important
to ensure high accuracy. Unfortunately, most datasets contain corrupted labels.
Machine learning models trained on such datasets do not generalize well. Thus,
detecting their label errors can significantly increase their efficacy. We
propose a novel framework, called CTRL (Clustering TRaining Losses for label
error detection), to detect label errors in multi-class datasets. It detects
label errors in two steps based on the observation that models learn clean and
noisy labels in different ways. First, we train a neural network using the
noisy training dataset and obtain the loss curve for each sample. Then, we
apply clustering algorithms to the training losses to group samples into two
categories: cleanly-labeled and noisily-labeled. After label error detection,
we remove samples with noisy labels and retrain the model. Our experimental
results demonstrate state-of-the-art error detection accuracy on both image
(CIFAR-10 and CIFAR-100) and tabular datasets under simulated noise. We also
use a theoretical analysis to provide insight into why CTRL performs so well.
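The two-step detection can be sketched as clustering a per-sample loss statistic into two groups and flagging the higher-loss cluster. This toy version clusters the mean of each loss curve with a simple two-means loop; the paper clusters the full curves, so every detail here is illustrative:

```python
import numpy as np

def flag_label_errors(loss_curves, n_iter=50):
    """Two-means clustering on per-sample mean training loss: samples whose
    losses stay high across epochs are flagged as noisily labeled."""
    scores = loss_curves.mean(axis=1)                 # one scalar per sample
    centers = np.array([scores.min(), scores.max()])  # init at the extremes
    for _ in range(n_iter):
        assign = np.abs(scores[:, None] - centers[None, :]).argmin(axis=1)
        for k in (0, 1):
            if np.any(assign == k):
                centers[k] = scores[assign == k].mean()
    return assign == np.argmax(centers)               # True -> suspected label error
```

After flagging, the suspected samples would be removed and the model retrained on the remainder.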
TransCODE: Co-design of Transformers and Accelerators for Efficient Training and Inference
Automated co-design of machine learning models and evaluation hardware is
critical for efficiently deploying such models at scale. Despite the
state-of-the-art performance of transformer models, they are not yet ready for
execution on resource-constrained hardware platforms. High memory requirements
and low parallelizability of the transformer architecture exacerbate this
problem. Recently-proposed accelerators attempt to optimize the throughput and
energy consumption of transformer models. However, such works are either
limited to a one-sided search of the model architecture or a restricted set of
off-the-shelf devices. Furthermore, previous works only accelerate model
inference, not training, even though training requires substantially more memory
and compute resources, making the problem even more challenging. To address these
limitations, this work proposes a dynamic training framework, called DynaProp,
that speeds up the training process and reduces memory consumption. DynaProp is
a low-overhead pruning method that prunes activations and gradients at runtime.
To effectively execute this method on hardware for a diverse set of transformer
architectures, we propose ELECTOR, a framework that simulates transformer
inference and training on a design space of accelerators. We use this simulator
in conjunction with the proposed co-design technique, called TransCODE, to
obtain the best-performing models with high accuracy on the given task and
minimize latency, energy consumption, and chip area. The obtained
transformer-accelerator pair achieves 0.3% higher accuracy than the
state-of-the-art pair while incurring 5.2× lower latency and 3.0×
lower energy consumption.
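DynaProp's runtime pruning can be illustrated as keeping only the top-magnitude fraction of an activation tensor and zeroing the rest. A NumPy sketch (the threshold rule and keep ratio are assumptions for illustration, not the paper's exact method):

```python
import numpy as np

def prune_activations(acts, keep_ratio=0.5):
    """Zero out low-magnitude activations, keeping the top keep_ratio fraction."""
    flat = np.abs(acts).ravel()
    k = max(1, int(keep_ratio * flat.size))
    threshold = np.partition(flat, -k)[-k]            # k-th largest magnitude
    return np.where(np.abs(acts) >= threshold, acts, 0.0)
```

The appeal at training time is that the same sparsification can be applied to gradients, shrinking both compute and the activation memory that dominates training footprints.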
BIOMOLECULES MEDIATED TARGETING OF VASCULAR ENDOTHELIAL GROWTH FACTOR IN NEURONAL DYSFUNCTION: AN IN SILICO APPROACH
Objective: Neurodegenerative diseases are debilitating age-related disorders manifested by memory loss, impaired motor activity, and loss of muscle tone due to the accumulation of toxic metabolites in the brain. Despite knowledge of the factors causing neurodegenerative disorders, they remain irreversible and incurable. Growing evidence has currently advocated the physiological and pathological contribution of hypoxia-induced vascular endothelial growth factor (VEGF) to neuronal loss. This research report highlights biomolecule-mediated targeting of VEGF activity based on in silico approaches that could establish a potential therapeutic window for the treatment of different abnormalities associated with impaired VEGF. Methods: We employed various in silico methods, including drug-likeness parameters (Lipinski filter analysis), the PockDrug tool for active-site prediction, AutoDock 4.2.1, and LigPlot 1.4.5 for molecular docking studies. Results: The three-dimensional structure of VEGF was generated and a Ramachandran plot obtained for quality assessment. RAMPAGE displayed 99.5% of residues in the most favored regions, 0.5% in additionally allowed regions, and no residues in disallowed regions of VEGF, showing that the stereochemical quality of the protein structure is good. Further, initial screening of the molecules was done based on Lipinski's rule of five. Finally, we found Naringenin to be the most effective among the three biomolecules in modulating VEGF activity, based on the minimum inhibition constant, Ki, and the highest negative free energy of binding with the maximum interacting surface area during docking studies. Conclusion: The present study outlines the novel potential of biomolecules in regulating VEGF activity for the treatment of different abnormalities associated with impaired VEGF.
EdgeTran: Co-designing Transformers for Efficient Inference on Mobile Edge Platforms
Automated design of efficient transformer models has recently attracted
significant attention from industry and academia. However, most works only
focus on certain metrics while searching for the best-performing transformer
architecture. Furthermore, running traditional, complex, and large transformer
models on low-compute edge platforms is a challenging problem. In this work, we
propose a framework, called ProTran, to profile the hardware performance
measures for a design space of transformer architectures and a diverse set of
edge devices. We use this profiler in conjunction with the proposed co-design
technique to obtain the best-performing models that have high accuracy on the
given task and minimize latency, energy consumption, and peak power draw to
enable edge deployment. We refer to our framework for co-optimizing accuracy
and hardware performance measures as EdgeTran. It searches for the best
transformer model and edge device pair. Finally, we propose GPTran, a
multi-stage block-level grow-and-prune post-processing step that further
improves accuracy in a hardware-aware manner. The obtained transformer model is
2.8× smaller and has a 0.8% higher GLUE score than the baseline
(BERT-Base). Inference with it on the selected edge device enables 15.0% lower
latency, 10.0× lower energy, and 10.8× lower peak power draw
compared to an off-the-shelf GPU.
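At its simplest, the co-design selection amounts to scanning profiled (model, device) pairs for the most accurate one that meets a latency budget. The real framework searches a large design space with a profiler and grow-and-prune post-processing; this exhaustive sketch with made-up names only illustrates the selection rule:

```python
def best_pair(accuracy, devices, latency, budget):
    """Return the (model, device) pair with the highest accuracy whose profiled
    latency fits the budget; None if no pair is feasible."""
    feasible = [(accuracy[m], m, d)
                for m in accuracy for d in devices
                if latency[(m, d)] <= budget]
    if not feasible:
        return None
    _, m, d = max(feasible)
    return m, d
```

In practice the objective would also weigh energy and peak power, and the search would be guided rather than exhaustive.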
DETECTING THE DEPTH OF THE CRACK BY LASER SPOT THERMOGRAPHY
Understanding the properties of cracks can give detailed insight into the health and status of any structure. Some cracks may not be detrimental, while others may cause the structure to collapse if not inspected, recognized, and repaired ahead of time. This article addresses detecting the depth of a crack using laser spot thermography. A 3D finite element analysis of a laser beam as a heat source and a steel specimen with cracks of various depths is performed using COMSOL Multiphysics 5.5. Then the relationship between the crack depth and the temperature differential index is studied using regression analysis. Finally, the equation obtained from the regression analysis is used to predict the depth of arbitrary cracks. The predicted depths are verified against the actual depths. The results are accurate, with the error ranging from +0.3 mm to -0.2 mm.
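The regression step maps the temperature differential index to crack depth, and the fitted equation then predicts depth for new cracks. A sketch with synthetic data standing in for the COMSOL simulation results (the linear form and the values are assumptions for illustration):

```python
import numpy as np

def fit_depth_model(temp_index, depth, degree=1):
    """Fit a polynomial regression of crack depth on the temperature
    differential index and return a callable depth predictor."""
    coeffs = np.polyfit(temp_index, depth, degree)
    return np.poly1d(coeffs)
```

With real simulation data, the residuals of this fit would correspond to the ±0.3 mm error band reported above.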